Introduction

This document is a report on the pre-registered experiment https://osf.io/s6vam/. The experiment examined silent failures in a highly controlled setting. Each trial starts with automated steering. In some trials a bias is introduced that causes the vehicle to veer off the road either suddenly (within 1.5 s) or gradually (~4 s). Participants are required to keep within the road edges and intervene if they feel that it is necessary. They complete the steering task on two bend radii, sharp (40 m) or gradual (80 m), without distraction or with an easy or difficult distraction.

Terminology

  • The bias is applied to yaw-rate, so it introduces error by causing the vehicle to understeer or oversteer. Effectively, the manipulation biases the mapping between steering angle (which does not change) and yaw-rate (which does change). Therefore we will henceforth call this error Steering Angle Bias (SAB).

  • SAB is introduced in all trials, though in half the trials the SAB is not large enough to cause the vehicle to leave the road (there will still be some drift). It is confusing to call the ‘stay on road’ trials ‘no failure’ trials (or ‘None’ failure type). Instead, the levels of Failure Type will be Sudden (rapidly drifts off road, requiring intervention), Gradual (slowly drifts off road, requiring intervention), and Benign (may experience some drift but it does not require intervention).

  • The different Cognitive Load levels will be Hard (three targets), Easy (one target), or None (no heard letters).

  • Bend levels will be referred to by their radii, 40 m or 80 m.

Technical Errors

This experiment stands as a lesson in the need to conduct a full technical pilot: running a participant through the entire experiment, then putting that data through the entire analysis workflow. Due to time pressures we often do some piloting to ostensibly check data saving, condition indexing, etc., but we rarely take the time to process the data through the analysis scripts. Here there were two errors in the data saving. First, the distraction performance files were overwritten when the driver was also steering (we still have the baseline, no steering, distraction performance files). Second, the “unique” filename for saving individual trials did not include the steering angle bias. In effect these trials were overwritten and only the last six (randomised) trials were saved for each radius, irrespective of SAB.

This means we only have 20-30 % of the expected data (Fig 1). However, the number of trials saved for each condition is random (for driving without distraction we have slightly more because there are two blocks), and 20-30 % is still a reasonable chunk of trials (> 40 in each condition). So at the very least the reduced data set will be useful for tweaking the design for an improved re-run, and for developing the modelling architecture.
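The overwriting problem can be caught before data collection with a simple uniqueness check on the generated filenames. The following is a minimal sketch, not the experiment's actual saving code: the function names, the filename pattern, and the condition values are all hypothetical and chosen only to illustrate the check.

```python
from itertools import product


def trial_filename(participant, radius_m, sab, load, block, trial_idx):
    """Build a filename that encodes every factor distinguishing a trial.

    The pattern is an assumption for illustration only.
    """
    return (f"P{participant:02d}_r{radius_m}_sab{sab:+.2f}_"
            f"{load}_b{block}_t{trial_idx:03d}.csv")


def find_collisions(names):
    """Return the set of filenames that occur more than once."""
    seen, dupes = set(), set()
    for name in names:
        if name in seen:
            dupes.add(name)
        seen.add(name)
    return dupes


# Hypothetical condition values, used only to exercise the check.
radii = [40, 80]
sabs = [-1.5, -0.5, 0.5, 1.5]
trial_indices = range(6)
conditions = list(product(radii, sabs, trial_indices))

# Names that include the SAB are unique for every trial.
good = [trial_filename(1, r, s, "Hard", 1, t) for r, s, t in conditions]

# Leaving the SAB out of the name reproduces the overwriting problem.
bad = [f"P01_r{r}_Hard_b1_t{t:03d}.csv" for r, s, t in conditions]

print(len(find_collisions(good)), "colliding names when SAB is included")
print(len(set(bad)), "unique names for", len(bad), "trials when SAB is omitted")
```

Running a check like this as part of the technical pilot would flag the problem before any participant data is lost.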

Fig 1. Amount of trials in each condition


Cognitive Load difficulty.

We only have the cognitive performance for baseline (without steering). Here we plot three indices of performance to assess whether our levels of cognitive difficulty are reflected in performance measures: Percentage Correct (PC; (True positives + True negatives) / total letters heard); Reaction Time (RT; reaction time for True positives); and Proportional Absolute Count Error (average distance from the true count, expressed as a proportion of the total number of targets heard).
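As a rough illustration of the first two indices, the sketch below computes PC and mean RT for true positives from per-letter records. The record layout (is_target, responded, rt_seconds) and the function names are assumptions made for this example, not the format of the actual data files. The count error is left out because its exact normalisation is discussed further below.

```python
def percentage_correct(tp, tn, fp, fn):
    """PC = (true positives + true negatives) / total letters heard, in percent."""
    total = tp + tn + fp + fn
    return 100.0 * (tp + tn) / total


def mean_rt_true_positives(responses):
    """Mean reaction time over letters that were targets AND were responded to.

    `responses` is a list of (is_target, responded, rt_seconds) tuples;
    this layout is hypothetical.
    """
    rts = [rt for is_target, responded, rt in responses
           if is_target and responded]
    return sum(rts) / len(rts) if rts else float("nan")


# Hypothetical records: two hits, one false alarm, one miss.
example = [
    (True, True, 0.5),
    (True, True, 0.7),
    (False, True, 0.3),   # false alarm, excluded from RT
    (True, False, None),  # miss, excluded from RT
]
print(percentage_correct(tp=8, tn=10, fp=1, fn=1))
print(mean_rt_true_positives(example))
```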

Fig 2. Cognitive Task Performance. A) Percentage Correct. B) Reaction Time. C-D) Absolute Count Error


Fig 2 shows that people respond differently to each Cognitive Load condition. Participants rarely make mistakes (i.e. responding to a distractor letter or not responding to a target letter) in the Easy Cognitive Load condition, but they make more mistakes in the Hard Cognitive Load condition (Fig 2A). Participants also react more slowly to heard target letters if they are listening for three targets rather than only one target (Fig 2B).

The interpretation of the count error (Fig 2C) is more nuanced. The total error (cumulative across all targets) is very similar across both conditions. How to adjust this total count error to take into account that people are responding to more targets in Hard than in Easy is the debatable bit. For example, estimating five instead of six occurrences (for one target) has the same total error as estimating zero, two, and three for target counts of one, two, and three (for three targets). Yet one could argue that the two responses are qualitatively different, because in the second case the participant has entirely missed a target but got the other two spot on. To reflect this I’ve chosen to express this measure as a proportion of the total number of targets heard (Fig 2C). In any case, people are only marginally less accurate on this measure in the Hard cognitive load condition than in Easy. It seems that the more precise measures of RT and PC are better indicators of differences in performance.

Therefore, a quick eyeball of the baseline cognitive task performance suggests that people reacted slower and experienced greater difficulty in deciding the appropriate response when there were three targets as compared to when there was only one.

We will now see how this gets reflected in steering measures.

Driver Takeover.

The pre-registration specifies two hypotheses relating to taking over control of the vehicle:

  • H1: Participants will react slower to silent failures as cognitive load increases (i.e. more time will elapse from the steering angle bias to the driver disengaging the automation).

  • H2: Participants will react with less aggression as cognitive load increases, bringing the vehicle back to the centre more slowly (i.e. they will be smoother and make fewer corrections).

These two hypotheses will be analysed in turn.

H1: Speed of take-over

The pre-registration states that two measures will be calculated:

  • RT from SAB onset to disengagement of the system via a button press (can be negative if they take over before the SAB is introduced). I will refer to this measure as RTtakeover.

  • RT from SAB onset to first steering movement (RTsteer).

For the purposes of a ‘quick’ look at the pilot dataset I will only use RTtakeover for now.
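The RTtakeover measure, and the "positive RTs only" filter used in the plots that follow, can be sketched as below. This is an illustrative sketch under stated assumptions: the function name, the use of None for trials without a takeover, and the example timings are hypothetical.

```python
import math


def rt_takeover(sab_onset_s, takeover_s):
    """Time from SAB onset to the disengage button press, in seconds.

    Negative when the driver took over before SAB onset.
    NaN when the driver never took over (takeover_s is None, an assumed encoding).
    """
    if takeover_s is None:
        return math.nan
    return takeover_s - sab_onset_s


# Hypothetical trials: (SAB onset, takeover time) in seconds.
trials = [(5.0, 5.6), (7.0, 6.5), (9.0, None), (6.0, 8.25)]
rts = [rt_takeover(onset, takeover) for onset, takeover in trials]

# Keep positive RTs only; NaN compares False, so no-takeover trials drop out too.
positive_rts = [rt for rt in rts if rt > 0]

# Proportion of trials in which the driver disengaged the automation.
takeover_rate = sum(not math.isnan(rt) for rt in rts) / len(rts)
print(positive_rts, takeover_rate)
```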

Fig 3. RT takeover across failure types


Fig 3 shows massive differences in how quickly drivers disengage the automation after SAB onset (using positive RTs only). In the Benign condition drivers disengaged the system in 43 % of trials. Since drivers react at such different speeds across the three failure types the failure type conditions will be plotted separately.

Fig 4. RT takeover across cognitive loads


Fig 4 shows cognitive load differences in RTtakeover within failure type (using positive RTs only, collapsed across bend radii). For Sudden failures (leftmost in Fig 4) there is a clear difference between None, Easy, and Hard. People switch to manual mode quickest when not performing a distractor task; they switch slowest when the distractor task is more difficult. However, note that the effect size here will be pretty small since the peaks are only about 0.1 s apart (modes are 0.59 s, 0.65 s, and 0.72 s respectively) with SDs of approximately 0.2 s. This difference is less pronounced for Gradual failures (middle) and Benign failures (right). In the Gradual and Benign failure conditions the data are spread over a wider range than in Sudden failures, so the depleted dataset causes problems for interpreting the shape of the distribution and the peaks may be unreliable. Nevertheless, for Gradual failures the median RT for None seems to be earlier (2.21 s) than for the Easy (2.52 s) and Hard (2.45 s) cognitive load conditions.

Fig 5, below, plots RTtakeover by bend radii for each failure type. Differences in RTtakeover between bend radii, if there are any, appear small.

Fig 5. RT takeover across bend radii


H1 Summary: of course this needs inferential statistics to support my eyeballing, but I will tentatively suggest that reaction time to silent failures does indeed increase as cognitive load increases, though this effect is small and depends on the severity of the failure.

The next section will examine the steering response after takeover.

H2: Steering Response

Firstly, we plot some trajectories to understand the constraints the driver is placed under. To make the failures unpredictable there are six potential automation trajectories, with onset times of the SAB varying between 5 and 9 s. This variability makes for messy plots if plotting all trajectories in Cartesian coordinates. Therefore, Fig 6 fixes the onset time (= 5 s, blue star in Fig 6) and the bend radius (= 40 m). You can see in Fig 6 that the potential automation trajectories all similarly hug the midline. It is also clear that the Sudden take-overs are tightly bunched compared to the Gradual and Benign (see also the RTs in Fig 4). Note that in the Gradual panel there is one trial where the take-over happens before the SAB is introduced. This happens in 3 % of trials.

Fig 6. Trajectories with a fixed onset time (at 5 seconds) and for bend radius of 40 m. Blue star is the time of onset. Dots are the time of take-over


Lane position allows for a better comparison of trajectories (Fig 7). Fig 7 plots the lane position immediately after switching to manual control. On average drivers seem to switch when lane position is closer to the midline in Sudden failures than in Gradual or Benign failures. However, due to the larger SAB, drivers take longer to correct for the error in lane position. The fact that drivers tend to switch at a lower lane error in Sudden failures suggests that drivers may be responding to the rate of change of lane position (which is higher in Sudden failures, giving a shorter Time to Lane Crossing at a given lane position) rather than lane position per se.
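To make the rate-of-change argument concrete, a first-order Time to Lane Crossing can be approximated as the remaining distance to the road edge divided by the lateral speed. The sketch below is only one simple formulation, and all numbers in it (road half-width, lane positions, lateral speeds) are hypothetical rather than values from the experiment.

```python
import math


def time_to_lane_crossing(lane_pos_m, lateral_vel_ms, half_width_m):
    """First-order TTLC in seconds.

    lane_pos_m: signed offset from the midline (positive = towards one edge).
    lateral_vel_ms: signed rate of change of lane position.
    half_width_m: distance from midline to either road edge (assumed symmetric).
    """
    if lateral_vel_ms == 0:
        return math.inf  # not drifting, so the edge is never reached
    # Pick the edge the vehicle is currently drifting towards.
    edge = half_width_m if lateral_vel_ms > 0 else -half_width_m
    return (edge - lane_pos_m) / lateral_vel_ms


# Hypothetical comparison: a fast drift near the midline versus
# a slow drift further from the midline, on a road with 1.5 m half-width.
ttlc_sudden = time_to_lane_crossing(0.3, 1.0, 1.5)
ttlc_gradual = time_to_lane_crossing(0.8, 0.25, 1.5)
print(ttlc_sudden, ttlc_gradual)
```

With these made-up values the fast drift has the smaller lane error but also the shorter TTLC, which is the pattern that would be expected if drivers respond to rate-of-change information.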

Fig 7. Steering bias for different failure types. Note that to avoid overplotting, oversteer SABs for Benign failures are not shown


Fig 8 stratifies cognitive loads within failure types. From the average trajectories it appears that drivers tend to correct for errors more quickly when they are not cognitively loaded. This is particularly true for Sudden failures and to a lesser extent Gradual failures. There are not many trials included in the Benign plot, so I would caution against drawing any conclusions from the averages. Correcting for errors more slowly when cognitively loaded was a feature of Wilkie et al., 2019. However, in that experiment takeovers started from the same lane position each time. In this experiment, if drivers were slower to take over, more error would have accrued, so it is plausible that differences in lane position over time (Fig 8) could be partly due to RT differences between cognitive load conditions (see Fig 4).

Fig 8. Steering bias for cogload within different failure types


To determine whether the separation between average trajectories for ‘No Load’ and ‘Loaded’ conditions is due to less aggressive steering in the cognitively loaded conditions, we can look at the steering wheel angle (SWA) plots over time (Fig 9). If the differences in lane position are driven only by delayed RTs, we would expect the average SWA traces to overlap (i.e. on average, identical steering responses across cognitive load conditions irrespective of small changes in initial lane position). Conversely, we might expect a reduced (in magnitude) or delayed steering wheel angle response in cognitive load conditions, as per Wilkie et al., 2019.

Fig 9. SWA for cogload within different failure types. The dashed line is the point of takeover


In no condition do we see convincing evidence of a reduced or delayed steering response when cognitively loaded (Fig 9). In Benign and Gradual failures (right and middle, Fig 9) the steering response seems pretty similar across cognitive load conditions, suggesting any separation in the steering bias plot could be due to reaction times.

The Sudden failures (left, Fig 9) require more description. The first thing to note is that the recorded steering wheel angle reaches the limit that corresponds to the maximum yaw-rate. The recorded steering wheel angle is not the true angle; it is re-calculated from a [1, -1] steering wheel value, which hits a limit at 90 degrees. Therefore, in many trials participants could be turning the wheel past the 90 degrees mark, yet the yaw-rate input to the simulation would be capped. This clearly happens for Sudden failures, where there is need for a sharp turn. Despite the wheel angle capping, it is interesting that experiencing Hard cognitive load seems to cause greater wheel turns than Easy and None. It is probable that this is not due to an effect of cognitive load on steering. Rather, it could be the combined effect of slower RTs causing greater steering demand, which cannot be wholly compensated for due to the capped steering wheel angle. Indeed, disentangling the indirect effects of cognitive load on steering (from increasing RTs and therefore increasing steering demand) from the direct effects of cognitive load on steering will be difficult, as steering demands correlate with RTs. The next plot examines this issue by plotting three measures of steering aggression (SWAmax, SWAvar, and SWAvel) against RTtakeover.
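The capping described above can be sketched as follows. This assumes, purely for illustration, a linear mapping from physical wheel angle to the normalised [-1, 1] value; the real mapping used by the simulator may differ, and the function names are hypothetical.

```python
MAX_ANGLE_DEG = 90.0  # limit stated in the text


def recorded_angle(physical_angle_deg):
    """Angle recovered from the normalised wheel value.

    Assumes the normalised value is physical_angle / 90, saturating at +/-1.
    Any wheel rotation beyond 90 degrees is therefore lost in the recording.
    """
    normalised = max(-1.0, min(1.0, physical_angle_deg / MAX_ANGLE_DEG))
    return normalised * MAX_ANGLE_DEG


def fraction_at_limit(angles_deg, tol=1e-6):
    """Proportion of samples sitting on the +/-90 degree cap."""
    capped = sum(abs(abs(a) - MAX_ANGLE_DEG) < tol for a in angles_deg)
    return capped / len(angles_deg)


# Hypothetical physical wheel angles during a sharp correction.
physical = [10.0, 60.0, 120.0, 150.0, 45.0]
recorded = [recorded_angle(a) for a in physical]
print(recorded, fraction_at_limit(recorded))
```

Reporting the fraction of capped samples per trial would give a rough indication of how strongly each condition is affected by the confound.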

Fig 10. Steering aggression measures correlated with RT. A) Maximum Steering Wheel Angle, B) Standard Deviation of Steering Wheel Angle, C) Average steering wheel velocity


Fig 10 plots the maximum steering wheel angle (SWAmax, Fig 10A), the standard deviation of steering wheel angle (SWAvar, Fig 10B), and the average steering wheel velocity (SWAvel, Fig 10C) for the manual control period of driving. Only trials with at least three seconds of manual driving are plotted, since by this time most drivers have made their initial steering correction (see Fig 9). Trials can have a maximum of 15 seconds of driving. A higher SWAmax would mean that the driver executed a sharper turn. A high SWAvar is often considered a (bad) proxy for wiggly steering. The two measures are correlated, since a higher SWAmax usually goes with a higher SWAvar. Steering wheel angle velocity is arguably a better measure of ‘jerkier’ steering, as higher SWAvel values would mean a driver turned the wheel more rapidly, but in this particular dataset the measure is partly confounded by the capping of the steering wheel angle: velocity equals zero for as long as the limit is hit.
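A minimal sketch of how the three measures could be computed from a single manual-control trace is given below. Several details are assumptions made for the example and are not confirmed by the analysis scripts: the use of absolute values for the maximum and for velocity, the sample (n - 1) standard deviation, a fixed sampling interval, and the function names.

```python
import statistics


def steering_measures(swa_deg, dt_s):
    """Return (SWAmax, SWAvar, SWAvel) for one manual-control trace.

    swa_deg: steering wheel angle samples in degrees.
    dt_s: sampling interval in seconds (assumed constant).
    """
    swa_max = max(abs(a) for a in swa_deg)           # largest absolute angle
    swa_var = statistics.stdev(swa_deg)              # sample standard deviation
    speeds = [abs(b - a) / dt_s for a, b in zip(swa_deg, swa_deg[1:])]
    swa_vel = statistics.fmean(speeds)               # mean absolute velocity
    return swa_max, swa_var, swa_vel


def long_enough(n_samples, dt_s, min_duration_s=3.0):
    """Inclusion rule from the text: at least three seconds of manual driving."""
    return n_samples * dt_s >= min_duration_s


# Hypothetical trace that sits on the 90 degree cap for one sample interval.
trace = [0.0, 30.0, 90.0, 90.0, 30.0]
swa_max, swa_var, swa_vel = steering_measures(trace, dt_s=0.5)
print(swa_max, swa_var, swa_vel)
```

In the example trace the interval spent on the cap contributes zero velocity, which pulls SWAvel down in the way described above.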

Reassuringly, similar trends are seen across all measures, suggesting that in our experiment these higher values on all three measures are indicative of steering corrections that are executed with greater magnitude and speed. Here we will refer to such steering corrections as steering demand. Fig 10 is designed to disentangle the relative contribution of RTtakeover and cognitive load to steering demand. It is hypothesised that a positive correlation will exist between RTtakeover and the steering demand measures, since the later the RTtakeover the closer one is to lane boundaries. It is further hypothesised, based on the literature base, that increased cognitive load might cause a reduction in steering demand. If these two hypotheses did not interact, one would see a positive correlation of RTtakeover and steering demand, with vertically separated regression lines for each cognitive load condition. The different failure conditions are discussed in turn:

  • In Sudden failures there is a positive correlation on all measures. In SWAmax and SWAvel the correlation is weaker, presumably due to the capping of steering wheel angle limiting the maximum values for both measures. The coloured regression lines for separate cognitive load conditions are generally overlapping, and when they are not it seems to be due to outlying values (i.e. there are some slow responses in the Hard condition). Therefore, it seems that any change in steering demand is due to cognitive load causing slower RTtakeover rather than reducing the propensity to correct for errors.

  • In Gradual failures we observe a consistent positive correlation between RTtakeover and steering demand. It’s probable that the trend is easier to see than in Sudden because drivers rarely hit the clipping limit for Gradual failures (Fig 9), so the measures are less confounded. It is interesting that in all measures the regression line for None sits slightly above the regression lines for Hard and Easy, hinting at a direct effect of cognitive load on steering demand. However, such an effect appears to be small compared to the impact of taking over control later.

  • Note that in Benign failures there are not many trials plotted because trials were excluded if they contained less than three seconds of manual control. Nevertheless, taking over later does not appear to result in greater steering demand. Remember that the Benign failures do not require a takeover, so it is possible that drivers sometimes switch to manual and do not make any corrections. Nor do there appear to be differences between cognitive load conditions in steering demand.
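The logic of looking for vertically separated regression lines can be illustrated with a per-condition least-squares fit. The sketch below uses made-up numbers, not data from the experiment, and is only meant to show what "same slope, different intercept" would look like; the function name and data layout are hypothetical.

```python
def fit_line(x, y):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


# Made-up (RTtakeover in s, SWAmax in degrees) pairs per cognitive load level.
data = {
    "None": ([0.5, 0.6, 0.7, 0.8], [40.0, 44.0, 48.0, 52.0]),
    "Hard": ([0.6, 0.7, 0.8, 0.9], [41.0, 45.0, 49.0, 53.0]),
}
fits = {load: fit_line(rt, swa) for load, (rt, swa) in data.items()}

for load, (slope, intercept) in fits.items():
    print(f"{load}: slope={slope:.1f}, intercept={intercept:.1f}")

# Equal slopes with different intercepts would indicate a direct effect of
# load on steering demand over and above the effect of a later takeover.
offset = fits["Hard"][1] - fits["None"][1]
```

In these made-up data the Hard condition has later takeovers and a larger raw SWAmax, yet its line sits below the None line at any given RTtakeover. A formal analysis would test the slope and intercept differences inferentially rather than by eye.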

H2 Summary: I will tentatively suggest that there does not appear to be a practically significant direct effect of cognitive load on the aggression of the first steering response. I propose that RTtakeover (which seems to fluctuate with cognitive load) is a stronger determinant of the strength of the first steering response.

Summary so Far: what about the mechanism?

TO BE FINISHED ON WEDS 15TH…

So far we have examined the speed of takeover and aggression of the initial steering response. While within each failure type there appear to be some interesting phenomena relating to cognitive load, by far the strongest determinant of RTtakeover is the severity of the failure (i.e. the magnitude of SAB). The most important question seems to be exactly how the magnitude of SAB changes RTtakeover. What perceptual information is the driver responding to? As mentioned in the discussion of Fig 7, the drivers do not always intervene at the same lane position. Rather, they intervene at earlier lane positions for Sudden failures, suggesting that they may be responding to rate of change information. However, because in effect we only have two SAB conditions (as there aren’t many takeovers in the Benign failures), any modelling to determine how drivers are using time to (lane) crossing (TTC) information will be overfitted to this dataset (Jami has done some preliminary development of models in this direction).

Therefore, we recommend massively increasing the number of silent failure levels (e.g. ten levels) so we have a decent breadth of SABs (and therefore TTCs) to fit models to. Using only one radius will increase the number of levels we can assess. To ensure even coverage we will keep the design factorial, which means we can also make inferences between conditions.
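A fully crossed schedule of this kind could be generated along the following lines. This is a sketch only: the number of repetitions, the blocking by cognitive load, and the use of level indices in place of actual SAB magnitudes are all assumptions that would need to be settled in the redesign.

```python
import random
from itertools import product

sab_levels = list(range(1, 11))    # ten level indices, not actual SAB magnitudes
loads = ["None", "Easy", "Hard"]   # cognitive load conditions from this report
repetitions = 6                    # hypothetical number of repeats per cell


def build_schedule(seed=0):
    """Return {load: shuffled list of (sab_level, repetition)}.

    Every SAB level appears equally often within every load block,
    which keeps the design factorial with even coverage.
    """
    rng = random.Random(seed)  # seeded so a schedule can be reproduced
    schedule = {}
    for load in loads:
        block = list(product(sab_levels, range(repetitions)))
        rng.shuffle(block)
        schedule[load] = block
    return schedule


schedule = build_schedule()
total_trials = sum(len(block) for block in schedule.values())
print(total_trials, "trials across", len(schedule), "load blocks")
```

Because each trial is identified by its full (load, SAB level, repetition) combination, the same tuple could also be used to build the saved filename.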

Jami has also suggested that we could model cognitive load as reducing sampling rate. Worth pursuing and I think would mean keeping the cognitive load conditions…

Another design change is increasing sensitivity of the steering wheel to lift the capping confound of the wheel angles…

Other Pre-registered Hypotheses

In the pre-registration there were three more hypotheses. Here I briefly list what we now know.

  • H3: Participants will sample from a more constricted region of the screen for increased cognitive load.

Not looked at yet, will look at if I have time before hols, but do not clearly see how this will change the design?

  • H4: These effects of cognitive load will increase as cognitive load difficulty increases.

For sudden failures RTtakeover appears to be stratified somewhat by Easy or Hard load. However, this effect is small, and smaller still (or nonexistent) in the less severe failures.

  • H5: Effects of cognitive load will become more pronounced when the (steering) task is more difficult (for sharp bends and sudden silent failures).

There does not seem to be an interaction jumping out for bend radii. As stated above, the small effects of cognitive load are more pronounced for Sudden failures.